mcr 2
- North America > United States (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
To All Reviewers: We thank all reviewers for your insightful feedback and your appreciation of our MCR$^2$ formulation. We will incorporate the suggestions on minor corrections, references, footnotes, and presentation in the final version. This work aims to introduce a new objective (i.e., MCR$^2$).
Comparison with OLE: "It is not clear why a larger . . . Does the OLE-type loss have the same property as Theorem 1? . . . The authors should show more comparisons . . ." We will make these connections clearer in the final version.
Q2, Gaussian assumption of the data: "I have a concern whether the rate distortion function . . ." . . . to be self-contained.
A Global Geometric Analysis of Maximal Coding Rate Reduction
Wang, Peng, Liu, Huikang, Pai, Druv, Yu, Yaodong, Zhu, Zhihui, Qu, Qing, Ma, Yi
The maximal coding rate reduction (MCR$^2$) objective for learning structured and compact deep representations is drawing increasing attention, especially after its recent usage in the derivation of fully explainable and highly effective deep network architectures. However, it lacks a complete theoretical justification: only the properties of its global optima are known, and its global landscape has not been studied. In this work, we give a complete characterization of the properties of all its local and global optima, as well as other types of critical points. Specifically, we show that each (local or global) maximizer of the MCR$^2$ problem corresponds to a low-dimensional, discriminative, and diverse representation, and furthermore, each critical point of the objective is either a local maximizer or a strict saddle point. Such a favorable landscape makes MCR$^2$ a natural choice of objective for learning diverse and discriminative representations via first-order optimization methods. To validate our theoretical findings, we conduct extensive experiments on both synthetic and real data sets.
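As a concrete reference point, the MCR$^2$ objective can be sketched numerically. The snippet below follows the standard rate-reduction formulation, $\Delta R(Z) = R(Z) - R_c(Z)$, for features $Z \in \mathbb{R}^{d \times n}$ with quantization parameter $\varepsilon$; it is a minimal NumPy illustration of the objective, not the authors' implementation.

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = (1/2) logdet(I + d/(n*eps^2) Z Z^T) for d-by-n features Z."""
    d, n = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def mcr2(Z, labels, eps=0.5):
    """Coding rate reduction: rate of the whole batch minus the
    class-proportion-weighted rates of the per-class sub-batches."""
    d, n = Z.shape
    rate_total = coding_rate(Z, eps)
    rate_classes = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        nc = Zc.shape[1]
        _, logdet = np.linalg.slogdet(
            np.eye(d) + (d / (nc * eps**2)) * Zc @ Zc.T)
        rate_classes += (nc / (2 * n)) * logdet
    return rate_total - rate_classes
```

Features from different classes lying in orthogonal directions give a strictly positive $\Delta R$, while a single class (or collapsed classes) gives $\Delta R = 0$, matching the intuition that the objective rewards discriminative yet compact representations.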
Learning Visual-Semantic Subspace Representations for Propositional Reasoning
Moreira, Gabriel, Hauptmann, Alexander, Marques, Manuel, Costeira, João Paulo
Learning representations that capture rich semantic relationships and accommodate propositional calculus poses a significant challenge. Existing approaches are either contrastive, lacking theoretical guarantees, or fall short in effectively representing the partial orders inherent to rich visual-semantic hierarchies. In this paper, we propose a novel approach for learning visual representations that not only conform to a specified semantic structure but also facilitate probabilistic propositional reasoning. Our approach is based on a new nuclear norm-based loss. We show that its minimum encodes the spectral geometry of the semantics in a subspace lattice, where logical propositions can be represented by projection operators.
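The nuclear norm at the heart of the proposed loss is the sum of a matrix's singular values. The sketch below computes it via SVD; `subspace_loss` is a hypothetical illustration of how a nuclear-norm penalty promotes low-dimensional class subspaces (the paper's actual loss and its subspace-lattice construction are not reproduced here).

```python
import numpy as np

def nuclear_norm(Z):
    """||Z||_* = sum of singular values of Z."""
    return np.linalg.svd(Z, compute_uv=False).sum()

def subspace_loss(Z, labels):
    """Hypothetical illustration: penalizing the nuclear norm of each
    class's feature block concentrates its energy in few directions,
    i.e., pushes the block toward a low-dimensional subspace."""
    return sum(nuclear_norm(Z[:, labels == c]) for c in np.unique(labels))
```

For a rank-one block the nuclear norm equals the Frobenius norm; as rank grows under a fixed Frobenius norm, the nuclear norm grows, which is why it serves as a convex surrogate for rank.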
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (0.48)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.34)
Justices for Information Bottleneck Theory
Cao, Faxian, Cheng, Yongqiang, Khan, Adil Mehmood, Yang, Zhijing
This study comes as a timely response to mounting criticism of the information bottleneck (IB) theory, injecting fresh perspectives to rectify misconceptions and reaffirm its validity. Firstly, we introduce an auxiliary function to reinterpret the maximal coding rate reduction method as a special yet local optimal case of IB theory. Through this auxiliary function, we clarify the paradox of decreasing mutual information during the application of ReLU activation in deep learning (DL) networks. Secondly, we challenge the doubts about IB theory's applicability by demonstrating its capacity to explain the absence of a compression phase with linear activation functions in hidden layers, when viewed through the lens of the auxiliary function. Lastly, by taking a novel theoretical stance, we provide a new way to interpret the inner organizations of DL networks by using IB theory, aligning them with recent experimental evidence. Thus, this paper serves as an act of justice for IB theory, potentially reinvigorating its standing and application in DL and other fields such as communications and biomedical research.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
Sample-efficient Quantum Born Machine through Coding Rate Reduction
The quantum circuit Born machine (QCBM) is a quantum physics inspired implicit generative model naturally suitable for learning binary images, with a potential advantage in modeling discrete distributions that are hard to simulate classically. As data samples are generated quantum-mechanically, QCBMs exhibit a unique optimization landscape. However, pioneering works on QCBMs do not consider the practical scenario where only small batch sizes are allowed during training. QCBMs trained with a statistical two-sample test objective in the image space require large amounts of projective measurements to approximate the model distribution well, which is impractical for large-scale quantum systems due to the exponential scaling of the probability space. QCBMs trained adversarially against a deep neural network discriminator are proof-of-concept models that face mode collapse. In this work, we investigate practical learning of QCBMs. We use the information-theoretic \textit{Maximal Coding Rate Reduction} (MCR$^2$) metric as a second moment matching tool and study its effect on mode collapse in QCBMs. We compute the sampling-based gradient of MCR$^2$ with respect to quantum circuit parameters, with or without an explicit feature mapping. We experimentally show that matching up to the second moment alone is not sufficient for training the quantum generator, but when combined with the class probability estimation loss, MCR$^2$ is able to resist mode collapse. In addition, we show that an adversarially trained neural network kernel for infinite moment matching is also effective against mode collapse. On the Bars and Stripes dataset, our proposed techniques alleviate mode collapse to a larger degree than previous QCBM training schemes, moving one step closer towards practicality and scalability.
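Second-moment matching in its plainest form compares empirical second moments of model and data batches. The sketch below uses a Frobenius-norm statistic as a simplified stand-in for the MCR$^2$-based metric the paper studies; the quantum sampling procedure and the class probability estimation loss are omitted, so this is only the classical moment-matching core.

```python
import numpy as np

def second_moment(X):
    """Empirical second moment E[x x^T] for row-wise samples X (batch, dim)."""
    return X.T @ X / X.shape[0]

def moment_matching_loss(model_samples, data_samples):
    """Squared Frobenius distance between empirical second moments:
    a simplified stand-in for the MCR^2-based second-moment statistic."""
    M = second_moment(model_samples) - second_moment(data_samples)
    return np.linalg.norm(M, "fro") ** 2
```

The loss vanishes when the two batches share their empirical second moment, which is exactly why, as the abstract notes, it cannot distinguish distributions that agree up to second order: higher moments (or an auxiliary loss) are needed.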
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Federated Representation Learning via Maximal Coding Rate Reduction
Cervino, Juan, NaderiAlizadeh, Navid, Ribeiro, Alejandro
We propose a federated methodology to learn low-dimensional representations from a dataset that is distributed among several clients. In particular, we move away from the commonly used cross-entropy loss in federated learning, and seek to learn shared low-dimensional representations of the data in a decentralized manner via the principle of maximal coding rate reduction (MCR$^2$). Our proposed method, which we refer to as FLOW, utilizes MCR$^2$ as the objective of choice, hence resulting in representations that are both between-class discriminative and within-class compressible. We theoretically show that our distributed algorithm achieves a first-order stationary point. Moreover, we demonstrate, via numerical experiments, the utility of the learned low-dimensional representations.
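FLOW's exact update rule is given in the paper, not the abstract; the skeleton below sketches only the generic federated pattern such a method builds on: clients update local copies of the parameters, and the server aggregates by averaging. Here `local_update` is a hypothetical placeholder for a client's local optimization step (in FLOW's case, ascent on the MCR$^2$ objective).

```python
import numpy as np

def fedavg(global_W, client_data, local_update, rounds=10):
    """Generic federated-averaging skeleton (illustrative only).

    local_update(W, data) -> a client's updated copy of the parameters.
    Each round, every client starts from the current global parameters,
    and the server averages the returned client copies.
    """
    W = global_W.copy()
    for _ in range(rounds):
        client_models = [local_update(W, d) for d in client_data]
        W = np.mean(client_models, axis=0)  # server-side aggregation
    return W
```

With a toy `local_update` that moves halfway toward a client-specific target, the averaged parameters converge geometrically to the mean of the clients' targets, illustrating the first-order stationarity the paper establishes for its actual objective.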
New Hard-thresholding Rules based on Data Splitting in High-dimensional Imbalanced Classification
Mojiri, Arezou, Khalili, Abbas, Hamadani, Ali Zeinal
In binary classification, imbalance refers to situations in which one class is heavily under-represented. This issue arises either from the data collection process or because one class is indeed rare in the population. Imbalanced classification frequently arises in applications such as biology, medicine, engineering, and social sciences. In this paper, for the first time, we theoretically study the impact of imbalanced class sizes on linear discriminant analysis (LDA) in high dimensions. We show that due to data scarcity in one class, referred to as the minority class, and the high dimensionality of the feature space, the LDA ignores the minority class, yielding the maximum misclassification rate. We then propose a new construction of hard-thresholding rules based on a data splitting technique that reduces the large difference between the misclassification rates. We show that the proposed method is asymptotically optimal. We further study two well-known sparse versions of the LDA in imbalanced cases. We evaluate the finite-sample performance of different methods using simulations and by analyzing two real data sets. The results show that our method either outperforms its competitors or has comparable performance based on a much smaller subset of selected features, while being computationally more efficient.
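The paper's exact thresholding rule cannot be reproduced from the abstract; the sketch below only illustrates the general idea, with hypothetical details: on a selection split of the data, keep just the features whose standardized mean difference between the two classes exceeds a threshold, so that the high-dimensional rule ends up depending on a small feature subset before any classifier is fit on the remaining split.

```python
import numpy as np

def hard_threshold_features(X0, X1, tau):
    """Illustrative feature selection by hard thresholding.

    X0, X1: (n_k, p) arrays holding the selection-split samples of the
    two classes. Keeps feature j when its two-sample t-like statistic
    |mean diff| / pooled sd * sqrt(min(n0, n1)) exceeds tau.
    """
    diff = X1.mean(axis=0) - X0.mean(axis=0)
    pooled_sd = np.sqrt((X0.var(axis=0) + X1.var(axis=0)) / 2 + 1e-12)
    t = np.abs(diff) / pooled_sd * np.sqrt(min(len(X0), len(X1)))
    return np.where(t > tau)[0]
```

Fitting the discriminant rule only on the selected coordinates of the held-out split keeps selection and classification independent, which is the point of the data-splitting construction.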
- North America > United States (0.14)
- North America > Canada > Quebec > Montreal (0.04)